-
Notifications
You must be signed in to change notification settings - Fork 255
feat: Add optional prefetch hint for parsing Puffin Footer #1207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi, I believe this hint should be optional and we shouldn't require all users to provide it. |
@Xuanwo It is optional, let me change the title. Or do you mean that i should add a function like |
Yep. I believe we should not change the |
@Xuanwo Thanks for the review, it is fixed! |
pub struct FileMetadata { | ||
pub(crate) blobs: Vec<BlobMetadata>, | ||
#[serde(skip_serializing_if = "HashMap::is_empty")] | ||
#[serde(default)] | ||
pub(crate) properties: HashMap<String, String>, | ||
/// Optional prefetch hint you can pass for retrieving footer | ||
#[serde(skip_serializing_if = "Option::is_none")] | ||
pub(crate) prefetch_hint: Option<u8>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have concerns adding this field for two reasons:
- It looks impractical to ask user to pass in the
prefetch_hint
. - This format is define by iceberg spec, and adding another field may lead to wrong file.
/// `prefetch_hint` is used to try to fetch the entire footer in one read. If | ||
/// the entire footer isn't fetched in one read the function will call the `read_no_prefetch` | ||
/// option. | ||
pub(crate) async fn read(&self, input_file: &InputFile) -> Result<FileMetadata> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub(crate) async fn read(&self, input_file: &InputFile) -> Result<FileMetadata> { | |
pub(crate) async fn read(input_file: &InputFile) -> Result<FileMetadata> { |
Suggestions:
- Keep original api
- Add anoter method
read_with_hint
to implement the method with a hint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liurenjie1024 fixed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jonathanc-n for this pr, generally looks good, some nit!
|
||
// Hint cannot be larger than input file | ||
if prefetch_hint as u64 > input_file_length { | ||
return FileMetadata::read(input_file).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to add some warning log here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets push this pull request forward and do this in a separate pull request (good first issue). Are we going to be using tracing
in this crate?
+ FileMetadata::FOOTER_STRUCT_LENGTH as usize | ||
+ FileMetadata::MAGIC_LENGTH as usize; | ||
if footer_length > prefetch_hint as usize { | ||
return FileMetadata::read(input_file).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
/// `prefetch_hint` is used to try to fetch the entire footer in one read. If | ||
/// the entire footer isn't fetched in one read the function will call the `read_no_prefetch` | ||
/// option. | ||
pub(crate) async fn read(&self, input_file: &InputFile) -> Result<FileMetadata> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liurenjie1024 fixed!
|
||
// Hint cannot be larger than input file | ||
if prefetch_hint as u64 > input_file_length { | ||
return FileMetadata::read(input_file).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets push this pull request forward and do this in a separate pull request (good first issue). Are we going to be using tracing
in this crate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jonathanc-n for this pr, LGTM!
@@ -787,7 +867,7 @@ mod tests { | |||
.await; | |||
|
|||
assert_eq!( | |||
FileMetadata::read(&input_file).await.unwrap_err().to_string(), | |||
FileMetadata::read(&input_file, ).await.unwrap_err().to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about that, forgot to delete. Fixed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Which issue does this PR close?
What changes are included in this PR?
Add prefetch hint for parsing puffin footer + change some docs
Are these changes tested?
Yes, unit tested